Learning from Chinese-English Parallel Data for Chinese Tense Prediction
Abstract
Tense prediction can be useful for many language processing tasks, such as temporal inference and machine translation. In this paper, we investigate using diverse contextual features for Chinese tense prediction under a statistical learning framework. Because annotated training data is lacking, we propose to leverage Chinese-English parallel corpora to automatically generate reference tense for model training. We also propose an iterative learning framework that deals with the noise in the reference data to improve learning. We evaluate on both automatically generated reference data and a set manually annotated with verb tense. Our results demonstrate the effectiveness of the proposed learning framework, which maps annotation from one language to another using parallel data. Furthermore, our iterative bootstrapping learning method performs better than training on the original automatically created data.
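The abstract does not spell out how reference tense is generated from the parallel data. As a rough illustration only, the sketch below assumes word-aligned sentence pairs and Penn Treebank tags on the English side, and copies a coarse tense from each aligned English word onto the Chinese verb. The names `TAG_TO_TENSE`, `english_tense`, and `project_tense`, the tag-to-tense mapping, and the example sentence are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical mapping from Penn Treebank verb tags to coarse tense labels.
TAG_TO_TENSE: Dict[str, str] = {"VBD": "past", "VBP": "present", "VBZ": "present"}


def english_tense(tokens: List[str], tags: List[str], i: int) -> Optional[str]:
    """Guess a coarse tense for the English token at position i."""
    # Assumption: a directly preceding modal "will" marks future tense.
    if i > 0 and tags[i - 1] == "MD" and tokens[i - 1].lower() == "will":
        return "future"
    # Otherwise fall back on the tag; unmapped tags yield no label.
    return TAG_TO_TENSE.get(tags[i])


def project_tense(
    zh_verbs: Set[int],
    en_tokens: List[str],
    en_tags: List[str],
    alignment: List[Tuple[int, int]],
) -> Dict[int, str]:
    """Project tense onto Chinese verb positions via (zh_idx, en_idx) links."""
    labels: Dict[int, str] = {}
    for zh_i, en_i in alignment:
        # Only label Chinese verbs, and keep the first usable link.
        if zh_i in zh_verbs and zh_i not in labels:
            tense = english_tense(en_tokens, en_tags, en_i)
            if tense is not None:
                labels[zh_i] = tense
    return labels


# Illustrative pair: "他 昨天 去 了 北京" / "He went to Beijing yesterday".
en_tokens = ["He", "went", "to", "Beijing", "yesterday"]
en_tags = ["PRP", "VBD", "TO", "NNP", "NN"]
alignment = [(0, 0), (1, 4), (2, 1), (4, 3)]
print(project_tense({2}, en_tokens, en_tags, alignment))
```

Labels projected this way inherit alignment and tagging errors, which is the kind of noise the iterative learning framework is meant to handle.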
Similar resources
Buy one get one free: Distant annotation of Chinese tense, event type and modality
We describe a “distant annotation” method where we mark up the semantic tense, event type, and modality of Chinese events via a word-aligned parallel corpus. We first map Chinese verbs to their English counterparts via word alignment, and then annotate the resulting English text spans with coarse-grained categories for semantic tense, event type, and modality that we believe apply to both Engli...
Distant annotation of Chinese tense and modality
In this paper we describe a “distant annotation” method by which we mark up tense and modality of Chinese eventualities via a word-aligned parallel corpus. We first map Chinese verbs to their English counterparts via word alignment, and then annotate the resulting English text spans with coarse-grained tense and modality categories that we believe apply to both English and Chinese. Because Englis...
Chinese Tense Labelling and Causal Analysis
This paper explores the role of tense information in Chinese causal analysis. Experiments on two tasks, causal type classification and causal directionality identification, show significant improvement from tense features. To automatically extract the tense features, a Chinese tense predictor is proposed. Based on a large amount of parallel data, our semi-supervised approach impr...
Exploring Parallel Concordancing in English and Chinese
This paper investigates the value of computer technology as a medium for the delivery of parallel texts in English and Chinese for language learning. An English-Chinese parallel corpus was created for use in parallel concordancing, a technique which has been developed to respond to the desire to study language in its natural contexts of use. Specific problems of dealing with Chinese characters ...
Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction
This paper first describes an experiment to construct an English-Chinese parallel corpus, then applies the Uplug word alignment tool to the corpus, and finally produces and evaluates an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese web site containing law texts that have been manually trans...